Improving Document Ranking using Query Expansion and Classification Techniques for Mixed Script Information Retrieval

Authors

  • Subham Kumar
  • Anwesh Sinha Ray
  • Sabyasachi Kamila
  • Asif Ekbal
  • Sriparna Saha
  • Pushpak Bhattacharyya
Abstract

A large amount of user-generated transliterated content in Roman script is available on the Web for languages that use non-Roman indigenous scripts. This creates a mixed-script space, which may be monolingual or multilingual and contains more than one script. Information retrieval (IR) in the mixed-script space is challenging because both queries and documents can be written in the native script, in Roman script, or in both. Moreover, because there is no standard way of spelling a word in a non-native script, transliterated content can be written with different spelling variations. In this paper, we propose effective techniques for query expansion and query classification for mixed-script IR. The proposed techniques are based on deep learning, word embeddings and traditional TF-IDF. We generate our own resources to create the test bed for our experiments. Extensive empirical analyses show that our proposed methods achieve significantly better performance than a state-of-the-art baseline model (a 20.44% increase in MRR, a 22.43% increase in NDCG@1 and a 15.61% increase in MAP).
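The abstract reports its gains in MRR, NDCG@1 and MAP. As a minimal sketch of how these standard ranking metrics are usually computed (not the authors' evaluation code), the example below assumes binary relevance for MRR and MAP, graded gains with the common log2 discount for NDCG@k, and averaging over all judged queries. The function names, the query and document IDs, and the toy judgments are hypothetical; the paper's exact evaluation settings are not given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Set


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@rank at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def ndcg_at_k(ranking: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """NDCG@k: DCG of the top-k results divided by the DCG of the ideal ordering."""
    def dcg(values: List[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = [gains.get(doc_id, 0.0) for doc_id in ranking[:k]]  # unjudged docs get gain 0
    ideal = sorted(gains.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


def evaluate(runs: Dict[str, List[str]], qrels: Dict[str, Dict[str, float]], k: int = 1) -> Dict[str, float]:
    """Average MRR, MAP and NDCG@k over every query in qrels (missing runs score 0)."""
    if not qrels:
        raise ValueError("qrels must contain at least one query")
    totals = {"MRR": 0.0, "MAP": 0.0, f"NDCG@{k}": 0.0}
    for query_id, gains in qrels.items():
        ranking = runs.get(query_id, [])
        relevant = {doc for doc, gain in gains.items() if gain > 0}
        totals["MRR"] += reciprocal_rank(ranking, relevant)
        totals["MAP"] += average_precision(ranking, relevant)
        totals[f"NDCG@{k}"] += ndcg_at_k(ranking, gains, k)
    return {name: value / len(qrels) for name, value in totals.items()}


# Hypothetical toy data: ranked document IDs per query, and relevance gains per query.
TOY_RUNS: Dict[str, List[str]] = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d9"]}
TOY_QRELS: Dict[str, Dict[str, float]] = {"q1": {"d1": 1.0, "d7": 1.0}, "q2": {"d2": 1.0}}

if __name__ == "__main__":
    print(evaluate(TOY_RUNS, TOY_QRELS, k=1))
```

For the toy data, MRR is 0.75 (reciprocal ranks 1/2 and 1), MAP is 19/24, roughly 0.79, and NDCG@1 is 0.5, because the top result for q1 is not relevant. A relative increase such as those quoted above would then be computed by comparing these averages for two systems on the same queries.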

Similar resources

QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches

A major problem in information retrieval is the difficulty of defining the user's information needs; on the other hand, when a user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query to increase efficiency and improve accuracy in the information...

Fujitsu Laboratories TREC7 Report

In our first participation in TREC, our focus was on improving the basic ranking systems and applying text clustering techniques for query expansion. We tested a variety of techniques, including reference measures, passage retrieval, and data fusion, for the basic ranking systems. Some techniques were used in the official run; others were not used because of time limitations. We applied...

An Efficient Information Retrieval System Using Query Expansion and Document Ranking

Information retrieval is the process of searching for and retrieving information from documents that match a user query. The user's information requirement is represented by a query or profile that contains one or more search terms. Indexing plays an important role in retrieving information. Researchers have used indexing techniques only for document indexing and have not focused on speeding up the...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes that generate a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Publication date: 2016